←Test Files←

In testing a program which will use a large data file, such as a dictionary
for a spelling checker program, use a much smaller test file, for several
reasons:

(1) Your testing will take less time.

(2) You will be able to predict exactly what the program should do.

(3) You can easily change the file to exercise all parts of the program
(long words, words which come after the last dictionary entry, and so
on); a sample test file is sketched below.
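
For instance, a small test dictionary might look like this (the
particular words are only an illustration):

	an
	and
	antidisestablishmentarianism
	cat
	catalog
	zebra
	zygote

A very long word tests the program's word buffer, and running text
containing a word like ZYGOTES exercises what happens after the last
dictionary entry.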

←More Dynamic Programming←

I play a simple dice game with an opponent: he rolls two dice, then I roll
two dice, and we alternate until one of us has a total of 25 or more.  He
offers to bet three to one that he will win.  I should accept if my winning
chance is more than 0.25, and decline if less.
(This problem is a simplification of problems arising in backgammon endgames).

I can get a tolerable approximation of my chances by playing the game by
myself a few hundred times, but if the actual chance is within a few
percent of 0.25, I would need to play perhaps a thousand times to avoid a
significant chance of making the wrong decision.  I can program a computer
to make the same experiment, but it is easier, faster, and more reliable to
have it calculate the exact probabilities by another application of the
dynamic programming method.  Let f(A,B) be the chance for the first player
to win when he needs a total of A to win, and the second player needs a
total of B to win.  If A is zero or negative, f(A,B) = 1; if B is zero or
negative, f(A,B) = 0.  Otherwise, the first player rolls two dice, getting
numbers i and j between 1 and 6, and the second player now gets a turn, so
the second player's chance is f(B,A-i-j).  The two players' chances total
1.00, so the first player's chance after rolling the dice is 1-f(B,A-i-j).
The first player's chance before rolling the dice is found by averaging
over all the possible dice rolls; it is 

	1 - (1/36) SUM[1<=i<=6, 1<=j<=6] f(B,A-i-j)

I can calculate f(A,B) for each value of A and B up to 25, and save each in
a table as I go, for use in calculating f for larger values of A and B.  I
must be sure, though, that when I calculate f(A,B) I have already
calculated f(B,A-2) through f(B,A-12); this is assured if I do the
calculation in the order

f(1,1)
f(2,1), f(1,2), f(2,2)
f(3,1), f(3,2), f(1,3), f(2,3), f(3,3)
f(4,1), f(4,2), f(4,3), f(1,4), f(2,4), f(3,4), f(4,4)
etc.

(Another satisfactory order would be by increasing sums of A and B.)

The algorithm can be summarized this way:

	Initialize;

	FOR LARGER := 1 TO 25 DO
		BEGIN
		FOR K:=1 TO LARGER-1 DO
			GETF(LARGER, K);
		FOR K:=1 TO LARGER DO
			GETF(K, LARGER)
		END;

	WRITE(F[25,25])

where the procedure GETF(A,B) has this body:

	BEGIN
	SUM:=0;
	FOR I:=1 TO 6 DO
		FOR J:=1 TO 6 DO
			SUM:=SUM + F[B,A-I-J];
	F[A,B]:=1.00 - SUM/36.0
	END

Exercise:

Which values of the array F need to be initialized?  What should the
subscript bounds on F be?

The theme of this example is that the easiest way to compute a function for
one assignment of argument values may be to compute a complete table of the
function. 
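
For concreteness, here is one way the pieces above can be assembled into
a complete Pascal program (the program name and the loop variable T are
mine).  The subscript bounds and the initialization chosen here are one
possible answer to the exercise: the second subscript of F must reach
down to 1-12 = -11, and entries with a second subscript of zero or less
hold the chance 0, since there the opponent has already won.

	PROGRAM DICEGAME(OUTPUT);
	VAR
		F: ARRAY[1..25, -11..25] OF REAL;
		LARGER, K, T: INTEGER;

	PROCEDURE GETF(A, B: INTEGER);
	VAR
		I, J: INTEGER;
		SUM: REAL;
	BEGIN
	SUM:=0;
	FOR I:=1 TO 6 DO
		FOR J:=1 TO 6 DO
			SUM:=SUM + F[B,A-I-J];
	F[A,B]:=1.00 - SUM/36.0
	END;

	BEGIN
	(*IF THE OPPONENT NEEDS T <= 0, HE HAS ALREADY WON,
	  SO THE MOVER'S CHANCE IS 0*)
	FOR K:=1 TO 25 DO
		FOR T:=-11 TO 0 DO
			F[K,T]:=0.0;
	FOR LARGER:=1 TO 25 DO
		BEGIN
		FOR K:=1 TO LARGER-1 DO
			GETF(LARGER, K);
		FOR K:=1 TO LARGER DO
			GETF(K, LARGER)
		END;
	WRITELN(F[25,25])
	END.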

←Sorting/Programming Economics←

I want to write a program to read N numbers, and put them into array A in
increasing order.  In practice, such a program would more likely be reading
strings and using alphabetical order, but using numbers keeps the algorithm
simple.  If the numbers arrive in the order
	388 909 476 820 245 371
the successive contents of the variables used by the program will be

	I	A[1]	A[2]	A[3]	A[4]	A[5]	A[6]
----------------------------------------------------------------
	0	?	?	?	?	?	?
	1	388	?	?	?	?	?
	2	388	909	?	?	?	?
	3	388	476	909	?	?	?
	4	388	476	820	909	?	?
	5	245	388	476	820	909	?
	6	245	371	388	476	820	909

Each successive arrival is inserted into the array by moving the previous
numbers to the right, starting with the rightmost ones, until a number is
found which is not larger than the new one.  In Pascal:

	FOR I:= 1 TO N DO
		BEGIN
		READ(NEWDATUM);
		J:=I;	(*CANDIDATE LOCATION FOR NEWDATUM*)
		WHILE (NEWDATUM < A[J-1]) AND (J > 1) DO
			BEGIN
			A[J]:=A[J-1];
			J:=J-1;
			END;
		A[J]:=NEWDATUM
		END

Exercise:  In the above algorithm, nothing is ever placed in A[0], but that
variable must exist.  Why?

The inner iteration can be simplified slightly by assuming that the number
in A[0] is not greater than the new datum.  In effect, the small number in
A[0] serves as a sentinel to stop decreasing J, so that the test for J > 1
is no longer needed.  The above algorithm is preceded by A[0]:=SMALL, where
SMALL is a constant no larger than any possible datum, and the inner
iterative clause becomes WHILE NEWDATUM < A[J-1] DO
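
Putting the pieces together, the sentinel version of the algorithm reads:

	A[0]:=SMALL;	(*SENTINEL: NO DATUM IS SMALLER*)
	FOR I:= 1 TO N DO
		BEGIN
		READ(NEWDATUM);
		J:=I;	(*CANDIDATE LOCATION FOR NEWDATUM*)
		WHILE NEWDATUM < A[J-1] DO	(*A[0] STOPS THE SCAN AT WORST*)
			BEGIN
			A[J]:=A[J-1];
			J:=J-1
			END;
		A[J]:=NEWDATUM
		END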

The sorting method given above is called sorting by insertion.  It is a
very good algorithm if N is no more than 10 or 15.  If a program uses it
many times with a large value of N, it may be worthwhile to use a more
efficient algorithm.  To make the economically correct choice, you need to
calculate the cost of the wasted computer time, and the cost of the
programming time to use a better algorithm.

When N is large, other algorithms can sort many times faster, e.g. more
than ten times faster if N=1000, so almost the entire running time is
wasted.  The insertion sort algorithm, on the average, moves the average
datum half way through a sequence of N/2 previous data, so the number of
times through the inner iteration is about N x 1/2 x N/2 = N↑2/4.  There
are about ten machine operations in the inner iteration, each taking
about 10↑-6 seconds on my computer, for a total time of 10↑-5 x N↑2/4
seconds.  If computer time costs $0.25 per second, and N is 5000, the
cost is about $15 for each sorting job.  If using a better algorithm
requires two hours of work by a programmer paid $25 per hour, it costs
about $50, so the better algorithm should be used if the sorting will be
done four times or more.  (Remember, in estimating programmer time, to
include debugging and documentation as well as time to find or design
the algorithm.)
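
The break-even arithmetic is easy to mechanize.  Here is a sketch (the
constants are the estimates above; the program name, and the assumption
that the faster algorithm's running cost is negligible, are mine):

	PROGRAM BREAKEVEN(OUTPUT);
	CONST
		N = 5000;
		OPSPERITERATION = 10;	(*MACHINE OPERATIONS IN INNER ITERATION*)
		SECONDSPEROP = 1.0E-6;
		DOLLARSPERSECOND = 0.25;
		PROGRAMMERDOLLARS = 50.0;	(*TWO HOURS AT $25 PER HOUR*)
	VAR
		COSTPERSORT: REAL;
	BEGIN
	(*THE INNER ITERATION RUNS ABOUT N↑2/4 TIMES*)
	COSTPERSORT := OPSPERITERATION * SECONDSPEROP * (N/4) * N
			* DOLLARSPERSECOND;
	WRITELN('COST PER SORT, DOLLARS:          ', COSTPERSORT:8:2);
	WRITELN('SORTS TO REPAY THE PROGRAMMER:   ',
		PROGRAMMERDOLLARS/COSTPERSORT:8:2)
	END.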

The typical reader of this book is a student whose programming problems
are chosen to use insignificant amounts of computer time, so the emphasis
here is on simplicity and reliability rather than efficiency.

A professional approach to programming, though, includes awareness of the
tradeoffs between labor and machine costs, and willingness to investigate
the extensive research results on program efficiency.  For our example of
sorting, Knuth's "Sorting and Searching" is a definitive study.  The
program libraries for many computers include excellent sorting programs.